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Abstract 

This  paper  proposes  an  analytical  method  for  deriving  software  failure  regions,  which 
are  regions  of  the  input  space  that  are  mapped  failures  by  specific  faults.  Previous  studies 
have  used  empirical  rather  than  analytical  approaches  to  derive  failure  regions.  A  manual 
technique  is  presented  and  proven  to  produce  the  necessary  and  sufficient  conditions  of  a 
fault  being  executed  and  leading  to  a  failure.  Semiautomated  tools  to  assist  in  the  manual 
technique  are  discussed,  as  is  the  use  of  failure  regions  in  regression  testing. 
Index  terms:  Software  Testing,  Formal  Models,  Software  Faults,  Software  Failures,  Soft- 
ware Tools 


1      Introduction 

It  is  widely  recognized  that  many  faults  in  installed  software  are  introduced  during  software 
maintenance.  For  example,  a  recent  change  to  the  switching  software  used  in  the  US 
telephone  system  inadvertently  introduced  a  three-line  fault  that  interrupted  service  to 
customers  in  six  states.  [17]  Regression  testing  is  the  process  of  testing  software  to  determine 
if  a  modification  has  introduced  a  fault.  Currently,  there  is  no  consistent  theoretical 
basis  for  the  conduct  of  regression  testing,  forcing  maintainers  to  rerun  all  previous  tests 
(in  the  phone  system  example,  a  13- week  process)  to  determine  if  any  previously-correct 
results  have  been  affected  during  maintenance.  One  reason  for  modifying  software  during 
maintenance  is  to  correct  known  faults  in  the  software.  This  paper  describes  and  verifies 
a  technique  for  predicting  the  domain  of  input  that  produces  results  affected  by  a  specific 
fault  in  a  specific  piece  of  software.  This  technique  is  supported  by  a  set  of  software  tools, 
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also  described  in  this  paper.  Knowledge  of  this  input  domain  focuses  the  testing  needed 
to  determine  if  correction  of  the  fault  corresponding  to  the  domain  has  introduced  further 
faults.  For  convenience  of  presentation,  this  paper  assumes  that  the  software  to  be  analyzed 
is  written  in  an  ALGOL-derived  language. 

In  this  paper,  a  fault  is  an  erroneous  piece  of  program  source  code.  An  error  is  a 
discrepancy  between  a  computed,  observed  or  measured  value  or  condition  and  the  true, 
specified  or  theoretically  correct  value  or  condition,  the  immediate  result  of  the  execution 
of  a  fault.  A  failure  is  the  termination  of  the  ability  of  a  functional  unit  to  perform  its 
required  function  (i.e.,  a  failure  is  either  the  production  of  a  value  that  contradicts  some 
portion  of  the  specification,  or  the  lack  of  production  of  a  specified  value). [1] 

A  software  failure  region  is  the  set  of  all  input  values  that  are  mapped  by  an  individual 
program  fault  onto  any  failure.  Each  fault  in  a  program,  by  definition,  is  associated  with  a 
single  failure  region,  which  may  be  composed  of  several  disjoint  or  overlapping  subregions. 
A  single  failure  region  may  be  associated  with  more  than  one  fault,  these  faults  being 
differentiated  by  varying  effects  on  the  program  results.  Different  faults  may  also  map 
distinct  failure  regions  onto  the  same  set  of  program  failures.  See  figure  1  for  a  diagram  of 
these  failure  region  -  fault  -  set  of  failure  associations. 

This  paper  describes  a  manual  technique  for  analytically  deriving  a  failure  region  from 
a  known  fault  in  a  program's  source  code.  This  technique  views  a  failure  region  as  bounded 
by  the  conjunction  of  three  boolean  conditions.  The  first,  reachability,  states  the  conditions 
that  hold  when  the  program  executes  the  section  of  source  code  containing  the  fault.  The 
second,  generation,  states  the  conditions  that  cause  the  fault  to  produce  an  erroneous 
internal  state  (e.g.,  bad  variable  values).  The  third,  propagation,  states  the  conditions 
under  which  the  erroneous  internal  state  becomes  a  failure. 
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Figure  1:  Associations  between  Failure  Regions,  Faults  and  Failures 


If  software  developers  can  efficiently  determine  a  software  failure  region,  regression 
testing  would  be  simplified.  The  bounding  conditions  of  a  failure  region  are  the  necessary 
and  sufficient  conditions  for  a  failure  due  to  the  associated  fault.  The  regression-testing 
subtask  of  evaluating  if  the  fault  is  fully  corrected  may  be  viewed  as  applying  a  sample 
from  the  failure  region  to  the  program  and  examining  the  results  for  failures.  Instead  of 
simply  repeating  previous  tests,  testers  could  apply  intelligent  sampling  techniques  on  the 
failure  region.  The  regression-testing  subtask  of  evaluating  if  the  fault  correction  has  not 
introduced  new  faults  may  also  be  facilitated  by  knowledge  of  the  failure  region.  This 
latter  question  is  the  subject  of  ongoing  research [6]. 

The  following  section  briefly  reviews  research  that  has  involved  failure  regions  or  con- 
cepts similar  to  failure  regions.  Section  three  describes  a  manual  method  for  analytical 
derivation  of  the  failure  regions.  Section  four  describes  automated  tools  that  have  been  de- 
veloped to  aid  this  analysis.  This  paper  concludes  with  a  discussion  of  specific  applications 
of  failure  regions  to  regression  testing. 

2      Failure  Regions 

Software  failure  regions  are  sets  of  program  inputs,  one  set  for  each  fault  in  the  program 
and  each  element  in  each  set  a  complete  series  of  inputs  that  causes  the  program  to  execute 
the  corresponding  fault  and  produce  a  failure  due  to  the  fault.  These  sets  are  always  finite, 
given  the  limitations  on  representation  in  a  finite  machine,  but  are  often  too  large  for 
tractable  enumeration.  Each  element  of  each  set  may  be  of  complex  structure.  Due  to 
these  considerations,  this  paper  will  denote  failure  regions  by  the  boolean  conditions  that 
identify  their  elements. 

Ammann  and  Knight  first  described  software  failure  regions  in  research  on  software  fault 
tolerance[3].  They  identified  and  characterized  failure  regions  by  repeated  sampling  across 
a  plane  through  the  program  input  space.  They  used  plots  of  the  identified  portions  of  the 


failure  regions  as  a  basis  for  the  proposal  of  a  data- variation  fault-tolerance  technique.  No 
attempt  was  reported  to  identify  the  bounding  conditions  of  the  failure  regions.  Dunham 
and  Finelli  applied  this  sampling  procedure  to  other  software[4],  referring  to  failure  regions 
to  as  'error  crystals'. 

In  parallel  with  this  fault-tolerance-based  interest  in  the  effects  of  known  faults,  testing- 
based  research  has  investigated  conditions  that  would  lead  to  detection  of  suspected  or 
hypothesized  faults[ll,  18,  5,  12].  The  testing  techniques  that  have  been  derived  from  that 
research  seek  to  identify  at  least  one  data  set  that  would  reveal  a  fault,  rather  than  in 
deriving  the  necessary  and  sufficient  conditions  to  identify  all  data  that  would  reveal  the 
fault.  These  two  problems  are  clearly  related,  but  they  are  not  identical,  and  it  is  the  latter 
problem  that  is  the  subject  of  this  paper. 

Voas[16]  investigated  the  dual  of  the  problem  discussed  in  this  paper,  namely  iden- 
tification of  the  portions  of  the  program  text  that  are  most  likely  to  contain  faults  by 
examination  of  where  faults  are  likely  to  be  propagated  or  obscured.  This  probabilistic 
work  has  focussed  principally  on  missing-statement  faults  and  the  insights  from  that  work 
are  currently  under  examination  using  software  failure  regions[6]. 

While  there  has  been  broad  interest  in  the  conditions  that  lead  to  fault  detection,  no 
one  has  previously  proposed  a  technique  to  derive  the  necessary  and  sufficient  conditions 
for  a  fault  to  cause  a  failure.  Such  a  technique  is  the  focus  of  the  next  section. 

3     Manual  Technique  for  Deriving  Failure  Regions 
3.1     Assumptions 

There  are  several  assumptions  that  are  implicit  in  the  failure  regions  analysis  technique. 
These  assumptions  may  limit  the  applicability  of  the  technique  and  are  provided  here  to 
give  context  to  the  technique  description  that  follows. 

The  first  assumption  is  that  the  region  is  to  correspond  to  a  known  fault.   The  region 


analysis  technique  does  not  detect  faults,  but  generates  failure  regions  for  previously- 
detected  faults.  Myers[10]  conjectured  that  the  sorts  of  information  that  appear  in  a 
failure  region's  bounding  conditions  may  be  used  to  detect  other  faults,  but  this  has  yet  to 
be  empirically  determined.  The  next  section  describes  the  characteristics  of  a  fault  used 
in  application  of  the  region  analysis  technique  . 

The  second  assumption  is  that,  while  deriving  the  failure  region  corresponding  to  a 
given  fault,  the  effects  of  other  faults  may  be  ignored.  In  the  early  stages  of  program 
testing,  this  assumption  may  be  quite  restrictive,  and  research  is  anticipated  to  derive 
failure  regions  for  multiple  faults  simultaneously. 

The  third  assumption  is  that  the  analyst  knows  the  loop  effects  for  each  loop  in  the 
program,  whether  iterative  or  recursive.  Specifically,  the  technique  assumes  the  analyst 
can  indicate  what  changes  in  the  local  data  state  of  a  program  execution  are  made  by 
a  loop.  This  may  be  easier  to  define  then  a  loop  invariant.  While  identification  of  loop 
effects  is  impossible  in  the  general  case,  since  not  all  loops  terminate  or  can  be  proved 
to  terminate,  in  many  cases  the  source  code,  design  or  requirements  documentation  will 
readily  provide  this  information. 

The  fourth  assumption  is  that  the  program  under  consideration  is  a  sequential  program 
with  soft-real-time  constraints.  The  program  is  assumed  sequential  to  avoid  consideration 
of  interleaving  and  interference  effects.  It  is  assumed  under  soft-real-time  constraints  to 
allow  analysis  for  excessive  looping  faults,  but  to  avoid  consideration  of  explicit  timing 
values  during  the  analysis.  Research  is  underway  to  extend  the  analysis  technique  to  deal 
with  concurrent  and  hard-real-time  software. 

The  fifth  assumption  is  that  the  virtual  machine  provided  by  the  language  processor, 
hardware  and  operating  system  will  not  differ  from  its  specified  operation  in  a  manner 
that  affects  the  analysis.  In  practice,  this  assumption  means  that  either  the  program  un- 
der analysis  avoids  any  problems  with  the  virtual  machine,  or  that  the  developers  have 
corrected  these  problems  prior  to  application  of  the  region  analysis  technique  on  the  appli- 


cation  software.  It  is  not  assumed  that  the  virtual  machine  is  an  ideal  machine,  nor  that 
it  has  limitless  precision  and  storage,  so  that  the  technique  may  analyze  faults  that  lead 
to  problems  revealed  by  virtual  machine  limits. 

The  last  assumption  is  that  undefined  variables  will  not  be  coincidentally  correct.  We 
introduce  this  assumption  to  avoid  system-specific  probabilistic  arguments  as  to  the  con- 
tents of  memory.  In  practice,  this  assumption  introduces  a  degree  of  overestimation  into  the 
failure  region  analysis,  but  in  this  first  formulation  of  the  failure  region  analysis  technique, 
we  view  that  overestimation  as  tolerable. 

3.2     The  Fault  and  Program  Models 

In  this  paper,  a  fault  is  an  erroneous,  possibly  non-contiguous,  portion  of  the  program 
source  code.  A  fault  may  be  undesirable  extra  statements,  a  miswritten  set  of  statements, 
or  a  lack  of  needed  statements.  A  fault  is  characterized  for  the  region-analysis  technique 
by  three  pieces  of  information.  The  first  is  its  location,  which  is  either  the  statement  most 
closely  enclosing  the  fault  or  the  first  statement  encountered  in  the  course  of  execution  that 
uses  values  affected  by  the  fault.  Of  these  two  alternatives,  the  first  is  most  tractable  in 
analysis,  but  can't  be  defined  for  some  missing-code  faults.  Should  there  be  no  statement 
that  uses  values  affected  by  the  erroneous  code  portion,  then  if  that  section  causes  the  lack 
of  an  output,  the  analysis  treats  it  as  a  self-revealing  fault  (see  section  3.3);  otherwise  if 
that  section  does  not  cause  the  lack  of  an  output,  then  it  is  not  a  fault  to  be  analyzed  by 
this  technique. 

The  second  piece  of  information  used  to  characterize  a  fault  is  the  list  of  variables 
that  the  immediate  execution  of  the  fault  gives  erroneous  values  to,  without  any  interven- 
ing action  by  any  other  section  of  the  source  code.  These  erroneous  values  (henceforth 
called  errors),  are  viewed  as  being  stored  in  atomic  variables,  slices  of  arrays  or  files  with 
known  limits,  individual  fields  of  a  record  structure,  indirect-referenced  data  with  all  aliases 
known,  and  entire  sets,  strings  or  variant  structures.  Should  a  fault  not  produce  an  error, 


but  otherwise  cause  the  program  to  fail,  the  analysis  treats  it  as  a  self- revealing  fault. 

The  third  piece  of  information  used  to  characterize  a  fault  is  the  conditions  under  which 
it  will  generate  an  error.  These  conditions  are  represented  as  a  boolean  expression  on  the 
local  execution  state  of  the  module  containing  the  fault's  location.  The  analysis  should 
include  the  known  limits  on  slices  of  arrays  or  files  in  this  expression. 

More  formally,  then,  the  analysis  technique  uses  a  characterization  of  a  fault  as  a  3- 
tuple: 

<  LF,  VF,  CF  > 

where  Lp  is  the  location  of  the  fault,  Vp  is  the  list  of  variables  that  form  the  error  caused 
by  the  fault  and  Cp  is  the  the  conditions  under  which  the  fault  causes  the  error,  described 
by  a  Boolean  expression.  In  practice,  this  characterization  means  that  the  fault  must  be 
identified  specifically  prior  to  the  analysis,  but  not  necessarily  corrected.  This  character- 
ization is  not  unique,  as  equivalent  Boolean  expressions  to  any  Cp  may  be  constructed 
by  negation  (i.e.,  _i(_,Cf)  =  Cf).  Such  an  equivalence  will  not  change  the  results  of  the 
analysis  technique. 

A  program  is  characterized  by  the  variables,  modules  and  statements  that  are  its  con- 
stituent parts.  Modules  and  variables  are  identified  by  unique  names  (if  necessary,  by 
associating  them  with  their  defining  scope).  Statements  are  identified  by  their  order  of 
occurrence  within  the  program  source  code,  and  may  enclose  subsidiary  statements.  The 
input  and  output  statements  of  the  program  are  explicitly  identified.  Formally,  then,  a 
program  is  characterized  as  a  6-tuple: 

<Vp,RP,MP,SP,Ip,0P> 

where  Vp  is  the  list  of  variables  defined  by  the  program,  Rp  is  a  subset  of  Vp  indicating 
variables  that  are  input,  Mp  the  list  of  modules,  each  having  the  same  structure  as  the 
program,  Sp  is  the  list  of  statements,  Ip  the  subset  of  Sp  that  are  input  statements,  Op 
the  subset  of  Sp  that  are  output  statements.  An  input  statement  is  defined  as  one  where 
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variables  receive  values  from  a  source  external  to  the  program.  An  output  statement  is 
defined  as  one  where  messages  and  variable  values  are  transmitted  to  a  sink  external  to 
the  program.  The  initial  statement  of  a  program  is  not  considered  an  input  statement, 
nor  is  the  final  statement  of  the  program  considered  an  output  statement,  unless  these 
statements  are  members  of  Ip  or  Op,  respectively. 

A  statement  S  is  formally  characterized  as  a  7-tuple: 

<  Ts,  Ds,  Rs,  Us,  Cs,  Es,  E's  > 

Ts  is  the  text  of  the  statement,  exclusive  of  any  enclosed  statements.  Ds  is  a  list  of  the 
variables  that  are  defined  (given  values)  by  the  statement.  Rs  is  a  list  of  the  variables  that 
are  referenced  by  the  statement.  Us  is  a  list  of  the  variables  that  become  undefined  (lose 
their  value)  at  statement  S.  Cs  is  the  portion  of  Ts  that  forms  a  boolean  condition.  Es  and 
E's  are  lists  of  statements  enclosed  by  the  statement  and  executed  if  Cs  evaluates  to  true 
or  false,  respectively.  Any  or  all  of  Ds,  Rs,  Es  and  E's  may  be  the  empty  list.  Should  the 
statement  not  contain  a  boolean  condition,  Cs  will  be  'true'.  A  path  is  a  set  of  statements, 
a  subset  of  the  union  of  all  statements  in  the  program  and  its  submodules.  The  elements 
of  a  path  are  all  statements  visited  during  some  specific  portion  of  a  program  execution, 
and  some  elements  of  a  path  may  also  appear  as  substatements  of  other  elements  of  the 
path. 

Lastly,  the  type  BooleanExpression,  used  in  the  sections  following,  is  an  expression 
in  the  first-order  predicate  calculus,  maintained  as  an  expression  rather  than  being  reduced 
to  a  Boolean  value. 

3.3     Overview  of  the  Analysis  Technique 

This  paper  describes  the  failure  regions  analysis  technique  using  a  pseudocode  similar  to 
Ada.  This  description  is  for  compactness  and  formality,  rather  than  for  implication  of 
automatic  implementation  of  the  analysis  technique. 


In  a  broad  view,  the  analysis  technique  is  made  up  of  four  steps:  1)  deriving  reachability 

conditions,  2)  merging  error-generation  conditions  obtained  during  fault  identification,  3) 

deriving  propagation  conditions  and  4)  post-processing  to  simplify  the  region  conditions 

and  eliminate  non-input  variables.  A  process-program  arrangement  of  these  steps  appears 

in  Figure  2.    A  self- revealing  fault,  one  that  always  causes  an  immediately  visible  result 

that  contradicts  the  specification  (e.g.,  an  abnormal  program  exit)  or  that  always  prevents 

a  specified  result,  omits  the  analysis  of  error  propagation  conditions  (i.e.,  its  propagation 

condition  is  'true'). 

procedure  DeriveRegion  (in  F: Fault;  in  P: Program; 

out  region : BooleanExpression)  is 
r:  BooleanExpression  :=  'true' 
q:  BooleanExpression  :=  'true' 
for  each  i  6  Ip   loop 

for  each  path  t    from  i   to  LF   loop 

r  :=  r  V  Reach  (t,LF) 
end  loop 
end  loop 

r  :=  r  A  CV  —   obtained  from   debugging 
if  F  is   not    a   self-revealing  fault   then 
for  each  o  G  Op   loop 

for  each  path  t   from  LF   to  o   loop 

q  :=  q  V  Propagate  (F,  t,  o) 
end  loop 
end  loop 
end  if 

r  :=  Reduce (r  A  q,  Vp  —  Rp) 
region  :=  Simplify (r) 
end  DeriveRegion 

Figure  2:  Top-Level  Description  of  Region  Analysis 

The  technique  requires  the  user  to  select  paths  for  the  derivation  of  the  reachability  and 
propagation  conditions.  The  assumptions  of  soft-real-time  program  character  and  isolation 
of  faults  combine  to  ensure  that  the  analysis  will  terminate  (i.e.,  that  it  need  not  consider 
infinite  or  excessive  looping  prior  to  the  fault  location). 
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3.4     Analysis  for  Fault  Reachability  Conditions 

The  first  analysis  step  derives  the  fault's  reachability  conditions  along  a  specific  path. 
The  analysis  invokes  this  step  iteratively,  as  described  in  figure  2.  This  step  forms  a 
composition  of  all  the  conditions  connecting  one  statement  in  the  path  to  another.  One 
group  of  conditions  occur  in  the  text  of  loop,  if,  and  multiple-branch  conditional  (case) 
statements.  This  group  will  henceforth  be  called  referencing  clauses. 

Another  group  of  conditions,  henceforth  called  defining  clauses,  occur  due  to  the  effect  of 
assignment  statements  and  procedure  or  function  calls.  These  clauses  represent  the  changes 
in  variable  values  that  occur  during  program  execution.  The  analysis  uses  defining  clauses 
to  record  transformation  of  the  program  state,  so  that  the  analyst  may  change  conditions 
that  occur  in  the  program  (involving  a  mix  of  input  values,  constants  and  current  values  of 
local  variables)  to  conditions  that  involve  only  input  values  and  constants.  It  is  the  latter 
conditions  that  form  the  desired  bounds  on  the  failure  region.  Most  defining  clauses  will 
not  appear  in  the  final  failure  region  bounds,  but  only  in  the  intermediate  forms  leading 
to  those  final  bounds. 

The  reachability  condition  analysis,  then,  traverses  the  statements  in  the  path,  form- 
ing a  composition  of  the  defining  and  referencing  clauses.  This  traversal  is  described  in 
figure  3.  The  formation  of  the  composition  is  guided  by  the  known  location  of  the  fault, 
and  simplified  to  deal  with  that  specific  set  of  code.  Loops  in  the  path  are  dealt  with 
by  applying  the  exit  condition  (if  the  loop  body  is  not  in  the  path)  or  applying  the  loop 
effects  (which  will  be  represented  by  Effect  (s,c)  below,  where  s  is  a  loop  statement 
operating  in  a  state  characterized  by  c)  and  then  extracting  the  loop  body  from  the  path. 
If  the  fault  location  is  part  of  the  loop  body,  then  the  traversal  algorithm  applies  the  loop 
effects  to  emulate  any  iterations  prior  to  the  fault  and  extracts  any  paths  through  the 
loop  body  except  from  the  loop  initiation  to  the  fault  location.  Other  control  statements 
are  dealt  with  by  incorporating  their  enclosed  conditions  in  the  reachability  condition  and 
traversing  their  enclosed  statements  (if  any).  Non-control  statements  are  dealt  with  using 
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the  Semantics  operation. 

function  Reach    (in   £:path;    in  g: statement) 

returns   BooleanExpression   is 
cur:    BooleanExpression    :=    'true' 
flat:    set   of   statements    :=  0 
flatn:    set   of   statements    :=   0 
for  each   statement   set  loop 
flat    :=  flatten  {E9) 
flatn    :=  flatten  (E'a) 

if  s   is   a   loop  A   Es  C  t  A  g  G  flat   then 
cur    :=   cur   A   Effect  (s,  cur)  A  C3 
t    :=   (i-flat)  U  {s'  \  s'  G  flatA  g  G  flatten  (s')} 
elseif   s    is    a    loop   A   g  G  flat   then 

cur    :=   cur   A   C3 
elseif  s    is    a    loop   A   Es  C t   then 
cur    :=    cur   A    Ef feet  (s,  cur) 
t    :=   t-Ea 
elseif  5    is    a    loop   A   Ea  (£  t   then 

cur    :=   cur   A    ->C9 
elseif   E3  ±  0A(£a  C  tM  g  G  flat)    then 

cur    :=    cur   A    C3 
elseif   E'a  y£  ®  A  {E'3  C  t  V  g  6  flatn)    then 

cur    :=   cur   A   ->C3 
else   cur    :=   Reduce  (cur,    D3)    A    Semantics  (T3,  cur) 
end   if 
end  loop 

return   Simplify (cur) 
end  Reach 

Figure  3:  Description  of  Reachability  Analysis 


The  derivation  of  the  reachability  and  propagation  conditions  make  use  of  an  operation 
referred  to  as  Semantics  and  an  inclusion-expanding  function  flatten.  Semantics 
represents  the  operation  of  the  underlying  virtual  machine.  The  analyst  is  assumed  to 
take  the  text  of  a  statement  and  a  context  for  evaluation  to  obtain  a  Boolean  expression 
describing  the  results  of  evaluation.  Methods  for  implementing  Semantics  in  practice 
include  symbolic  execution^  and  code  reading[9]. 

The  function  flatten  takes  a  list  of  statements  and  returns  the  union  of  that  list  with 
all  of  the  statements  enclosed  by  any  element  of  that  list.  The  algorithm  for  flatten  is 
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given  in  figure  4. 


function  flatten  (in   L:    set   of    statements; 

returns    set   of   statements   is 
cur:    set   of   statements    :=   0 
working:    set   of   statements    :=   L 
while  working  ^  0   loop 
s  6  working 

working    :=    (working—  {s})  U  Ea  U  E's 
cur    :=   curU£sU£^U  {s} 
end  loop 
return   cur 
end  flatten 

Figure  4:     Flatten  Algorithm 


3.5      Analysis  for  Error  Propagation  Conditions 

After  the  analysis  of  reachability  conditions  and  the  merger  of  the  error  generation  con- 
ditions that  were  obtained  during  fault  identification,  the  next  step  in  the  failure  region 
derivation  is  to  determine  when  (if  ever)  the  value  of  each  output  is  affected  by  the  fault. 
Figure  5  provides  an  overview  of  the  error-propagation  analysis. 

The  starting  point  for  this  analysis  is  a  list  of  variables  with  erroneous  values  that  are 
immediate  results  of  the  fault  (denoted  by  Vp  in  figure  5).  This  list  forms  the  initial  'con- 
tamination list'  for  the  analysis.  The  analysis  propagates  the  contamination  list  separately 
along  each  possible  path  from  the  location  of  the  fault  to  each  program  output.  In  each 
specific  path,  the  analysis  expands  the  contamination  list  as  variables  receive  values  de- 
rived from  one  or  more  variables  in  the  contamination  list,  the  analysis  removes  variables 
from  the  contamination  list  as  they  become  undefined  or  as  program  statements  give  them 
new  values  that  do  not  derive  from  variables  in  the  contamination  list.  If  the  last  output 
statement  in  the  path  references  variables  in  the  contamination  list,  then  the  analysis  re- 
turns the  path  condition  as  a  partial  failure  region  condition.  If  the  output  statement  does 
not  reference  any  of  the  variables  in  the  contamination  list,  or  if  the  contamination  list 
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becomes  empty,  then  the  path  is  not  incorporated  in  the  failure  region  conditions. 

3.6  Post-Processing  of  Analysis 

To  be  a  practical  source  of  information  for  testing,  the  bounding  conditions  of  a  software 
failure  region  must  be  simplified  as  much  as  feasible  without  loss  of  information  and  the 
conditions  must  involve  only  variables  that  are  of  interest  to  the  testers,  typically  the  pro- 
gram input  variables.  There  are  two  operations  that  are  used  to  convert  the  conjunction 
of  conditions  derived  from  the  program  text  into  a  useful  failure  region  bound.  The  Sim- 
plify operation  is  a  conventional  Boolean  simplification.  The  Reduce  operation  takes  a 
Boolean  condition  (involving  both  defining  and  referencing  clauses)  and  a  list  of  variables. 
It  returns  a  Boolean  condition  in  which  each  reference  to  any  of  the  variables  on  the  list 
has  been  replaced  with  the  associated  definition,  and  any  definition  of  the  variables  on  the 
list  has  been  eliminated. 

The  preceding  description  of  the  Simplify  and  Reduce  operations  does  not  imply 
that  these  operations  are  to  be  fully  automated.  In  practice,  the  analyst  may  need  to 
intervene  to  replace  complex  or  intertwined  clauses  with  simpler  forms  derived  from  the 
program  specification,  design  or  source  code.  This  has  been  found  to  be  particularly  true 
when  the  input  data  control  the  values  of  program  variables  only  in  an  indirect  fashion. 

3.7  Example  of  Deriving  A  Software  Failure  Region 

As  an  example  of  deriving  failure  regions  from  the  source  code,  consider  the  Pascal  function 
AcuteAngle  in  figure  6.  This  function  is  specified  to  determine  if  the  angle  subtended 
by  a  rectangle  centered  at  an  arbitrary  point  (Xt,Yt)  when  viewed  from  the  origin  (i.e., 
the  largest  angle  with  the  vertex  at  the  origin  and  two  of  the  corners  of  the  rectangle  as 
the  endpoints)  is  less  than  |  radians.  The  width  (w)  and  length  (/)  of  the  rectangle  are 
parallel  to  the  x  and  y  axes,  respectively.  (See  figure  7  for  an  illustration  of  a  rectangle 
and  origin  layout,  where  a  is  the  angle  to  be  measured.)    This  function  is  derived  from 
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function  Propagate  (in  F:  Fault;  in  t:path; 

in  o: statement)  returns  BooleanExpression  is 
cur:  BooleanExpression  :=  'true' 
contamlist:  list  of  variables  :=  VF 
flat:  set  of  statements  :=  0 
flatn:  set  of  statements  :=  0 
for  each  statement  s  E  t   loop 
if  contamlist  is  null  then 
cur  :=  false 
exit  loop 
end  if 

flat  :=  flatten  (JSa) 
flatn  :=  flatten  (E'a) 

if  s   is  a  loop  AE3  C  t  A  LF  E  flat  then 
cur  :=  curAEf  feet  (s,  cur)  A  C3 
t    :=  (*-fiat)U{s'  \s'  £  flat  ALF  E  flat  ten  (s')} 
elseif  5  is  a  loop  ALF  E  flat  then 

cur  :=  curACj 
elseif  5  is  a  loop  AES  C  t   then 
cur  :=  curA  Effect  (5,  cur) 
t    :=   t-E3 
elseif  5  is  a  loop  AEa  ^  t   then 

cur  :=  curA-iC, 
elseif  Es  ^  0  A  {Ea  C  t  V  LF  E  flat )  then 

cur  :=  curAC, 
elseif  E'a  ^  0  A  (£;  C  t\/LF  E  flat)  then 

cur  :=  curA->C3 
else 

cur  :=  Reduce  (cur,  D3)    A    Semantics  (Ts,  cur) 
end  if 
if  contamlistni?3  ^  0  then 

contamlist  :=  contamlistU.D, 
elseif  contamlistfli),  ^  0  then 

contamlist  :=  contamlist  -  D9 
end  if 

contamlist  :=  contamlist  -  U3 
end  loop 

if  contamlistni?0  =  0  then  return  false 
else  return  Simplify (cur) 
end  if 
end  Propagate 

Figure  5:  Description  of  Propagation  Analysis 
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code  developed  for  a  software  experiment[15].  The  particular  implementation  in  figure  6 
has  a  fault  at  line  13,  where  the  corners  used  to  calculate  the  subtended  angle  are  those 
identified  as  B  and  C  in  figure  8,  rather  than  the  proper  B  and  D.  Assume  that  the  only 
input  statement  is  at  line  1  and  the  only  output  statement  is  at  line  29.  The  conditions 
for  reaching  line  13  are: 

XA  =  XT  -  f  A  YA  =  YT  +  i  A 

XB  =  XT  +  f  AYB  =  YT+{A 

Xc  =  XT  -  f  A  Yc  =  YT  -  L  A 

XD  =  XT+^  AYD  =  YT-  |A 

YD  <  0  A  YB  >  0  A  XD  <  0 
The  first  four  lines  are  defining  clauses,  the  last  line  a  referencing  clause. 

The  conditions  under  which  the  code  produces  an  erroneous  value  due  to  the  substitu- 
tion of  C  for  D  are  those  where  the  angle  formed  by  B,  C,  and  the  origin  (angle  COB)  is 
different  from  B,  D  and  the  origin  (angle  DOB),  or: 

Xc       Xd 
which,  since  Yc  =  Yd,  simplifies  to  Xq  >  Xc-,  which  reduces  to  w  >  0.     Lastly,  the 
conditions  under  which  the  erroneous  value  isn't  masked  by  the  comparison  at  line  29  in 
figure  6  are  where  angle  COB  is  less  than  |  and  angle  DOB  is  greater  than  |,  or: 

(tan-(|L)  _  tan-'(g))  <  \  <  (tan-(g)  -  tan"1^)) 

Combining  these  inequalities,  substituting  for  defining  clauses  and  simplifying  yields  the 
result: 

(^  <  YT  <  |)  A  XT  >  =f  A  w  >  0A 
( tan-1(  j^Jr)  —  tan"1^-^!)  j  <  f  <  ( tan-1(  XT+l )  —  tan~l(A:r~i 

A  cross-section  of  this  region  appears  as  the  cross-hatched  portion  of  figure  8  for  a  particular 
set  of  values  for  Xt,  Yj,  w  and  /.  As  we  vary  those  four  values,  the  failure  region  forms  a 
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tetrahedral  cylinder  through  the  four-dimensional  input  space,  with  a  rotating  tetrahedral 

groove  through  one  face. 

function  AcuteAngle (XT, YT, W, L: real) :boolean; 
const  Pi=3. 1415926535; 

var  AX, AY,  BX,BY,  CX,CY,  DX,DY:  real; 
Angle,  HalfWidth,  HalfLength:  real; 
begin 
1  HalfWidth  :=  W  /  2;  HalfLength  :=  L  /  2; 


2  AX 

3  DX 

4  BX 

5  CX 


=  XT-HalfWidth;  AY 

=  XT+HalfWidth;  DY 

=  DX;  BY 

=  AX;  CY 


=YT+HalfLength; 
=YT-HalfLength; 
=AY; 


=DY; 

6  if  ( (YO  >  BY)  and  (XO  >  BX) )  then 

7  Angle  :=  abs  (arctan  (DY/DX)  -  arctan (AY/AX) ) 

8  else  if  ( (XO  <  CX)  and  (YO  <  CY) )  then 

9  Angle  :=  abs (arctan (AY/AX)  -  arctan (DY/DX) ) 

10  else  if  ( (YO  >  AY)  and  (XO  <  AX))  then 

11  Angle  :=  abs  (arctan (CY/CX)  -  arctan (BY/BX) ) 

12  else  if  ( (YO  >  DY)  and  (XO  >  DX) )  then 

13  Angle  :=  abs (arctan (BY/BX)  -  arctan (CY/CX) ) 

14  else  if  XO  >  BX  then 

15  Angle  :=  abs (arctan (DY/DX)  +  arctan (BY/BX) ) 

16  else  if  XO  <  AX  then 

11  Angle  :=  abs (arctan (AY/AX)  +  arctan (CY/CX) ) 

18  else  if  (YO  >=  AY)  then 

19  if  (XO  =  AX)  then  Angle  :=  Pi/2  -  abs (arctan (BY/BX) ) 

20  else  if  (XO  <  BX)  then 

21  Angle  :=  Pi  -  abs (arctan (AY/AX) )  -  abs (arctan (BY/BX) 

22  else  Angle  :=  Pi/2  -  abs  (arctan  (AY/AX) ) 

23  else  if  (YO  <=  CY)  then 

24  if  (XO  =  DX)  then  Angle  :=  Pi/2  -  abs (arctan (CY/CX) ) 

25  else  if  (XO  >  CX)  then 

26  Angle  :=  Pi  -  abs (arctan (DY/DX) )  -  abs (arctan (CY/CX) 

27  else  Angle  :=  Pi/2  -  abs (arctan (DY/DX) ) 

28  else   Angle  :=  2  *  Pi; 

29  AcuteAngle  :=  Angle  <=  (Pi/2) ; 
end; 

Figure  6:  Function  AcuteAngle 
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Figure  7:  Parameter  Layout  for  Function  Acute  Angle 
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Figure  8:  Failure  Region  in  Function  AcuteAngle 


3.8      Validation  of  the  Analysis  Technique 

Section  1  defined  a  failure  region  as  bounded  by  the  necessary  and  sufficient  conditions  for 
a  fault  being  executed  and  leading  to  a  failure.  This  section  proves  the  following  theorem: 

Theorem  1  Given  the  assumptions  in  section  3.1  and  that  the  fault  detection  process 
accurately  supplies  Lp,  Vp  and  Cf,  and  that  the  Semantics  function  accurately  supplies 
the  effect  of  its  argument  statement  on  its  argument  environment,  DeriveRegion,  if 
it  terminates,  will  produce  necessary  and  sufficient  conditions  for  executing  a  fault  and 
causing  it  to  lead  to  any  failure. 

This  theorem  is  the  conjunction  of  the  following  two  propositions: 

Proposition  1  (Necessity)  All  input  sequences  that  satisfy  the  conditions  produced  by 
DeriveRegion  will  cause  the  fault  to  be  executed  and  lead  to  some  failure. 

Proposition  2  (Sufficiency)  There  are  no  input  sequences  that  fail  to  satisfy  the  con- 
ditions produced  by  DeriveRegion  and  cause  the  fault  to  be  executed  and  lead  to  any 
failure. 

Both  of  these  propositions  may  be  conveniently  proved  by  contradiction.  The  negation 
of  Proposition  1  assumes  that  some  input  satisfies  the  conditions  generated  by  DeriveRe- 
gion but  does  not  execute  the  fault  and  lead  to  a  failure.  This  may  only  happen  due  to 
one  or  more  of  the  following: 

nl  The  input  does  not  cause  the  code  at  the  fault  location  to  be  executed. 

n2  The  input  does  not  cause  the  fault  to  generate  an  error. 

n3  The  input  does  not  execute  a  path  from  the  fault  to  any  output  statement. 

n4  The  input  maps  the  error  generated  by  the  fault  into  a  correct  state  along  the 
path  to  each  output  statement. 
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Option  nl  may  occur  iff  at  some  statement  prior  to  Lp,  the  condition  on  the  branch 
that  leads  to  Lp  was  false  in  the  state  derived  from  the  input.  That  implies  that  there 
exists  some  input  statement  i  G  lp  such  that  a  path  from  i  to  Lp  exists,  but  that  path  is 
not  included  in  the  derivation  of  failure  region  bounds,  or  the  conditions  along  the  path 
were  incorrectly  analyzed  by  Reach.  The  implication  that  the  path  was  not  included  is  in 
conflict  with  the  'all  paths  from  i  to  Lp1  loop  in  DeriveRegion. 

The  implication  that  the  conditions  along  a  path  were  incorrectly  analyzed  by  Reach 
implies  two  possibilities:  the  referencing  clauses  were  incorrectly  analyzed  or  the  defining 
clauses  were  incorrectly  analyzed.  The  possibility  that  the  defining  clauses  were  incorrectly 
analyzed  is  a  contradiction  of  the  assumption  that  Semantics  accurately  captures  the 
effects  of  the  statement.  The  possibility  that  the  referencing  clauses  were  incorrectly  an- 
alyzed is  shown  a  contradiction  by  examination  of  the  condition  handling  in  Reach  (the 
if  structure  inside  of  the  for  each  loop). 

In  any  path  through  the  text  of  a  program,  there  are  three  possible  traversals  of  a 
single-entry,  single-exit  section  of  the  text  :  the  section  could  be  completely  traversed, 
the  path  could  terminate  inside  of  the  section,  or  the  path  could  include  none  of  the 
statements  in  the  section.  Complete  traversal  of  a  code  section  is  dealt  with  by  the  third, 
fifth  and  sixth  if  clauses  in  Reach,  in  which  the  conditions  for  entry  of  the  code  section 
are  included  in  the  path  conditions.  Partial  traversal  of  a  code  section  is  dealt  with  by 
the  first,  second,  fifth  and  sixth  if  clauses,  where  the  presence  of  the  path  termination  is 
tested,  and  if  present,  the  conditions  for  entry  of  the  code  section  are  included  in  the  path 
conditions.  Non-traversal  of  a  code  section  is  dealt  with  by  the  for  each  loop,  which 
restricts  construction  of  path  conditions  to  the  statements  in  the  path,  and  the  fourth 
if  clause,  which  incorporates  the  loop  exit  conditions  if  the  body  is  not  executed.  Thus 
in  each  case  the  appropriate  conditions  are  included  in  the  path  condition,  so  incorrect 
analysis  by  Reach  is  shown  to  be  a  contradiction. 
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Option  n2  is  a  contradiction  of  the  assumption  that  Cp  has  been  accurately  captured 
by  the  fault  detection  process,  and  hence  may  be  neglected. 

Option  n3  may  occur  iff  at  all  statements  reachable  from  Lp,  the  branches  that  include 
any  member  of  the  set  of  program  output  statements  required  conditions  that  were  found 
to  be  false  in  the  state  derived  from  the  input.  That  implies  that  there  exist  paths  from 
Lp  to  at  least  one  output  statement  o,  but  these  paths  are  not  included  in  the  derivation 
of  the  failure  region  bounds  or  the  conditions  along  the  path  were  incorrectly  analyzed  by 
Propagate.  The  implication  that  the  path  was  not  included  is  in  conflict  with  the  'all 
paths  from  Lp  to  o'  loop  in  DeriveRegion.  The  implication  that  the  conditions  along 
the  path  were  incorrectly  analyzed  by  Propagate  may  be  shown  to  be  a  contradiction 
through  argument  paralleling  the  preceding  discussion  of  Reach. 

Option  n4  may  occur  iff  at  some  point  on  the  path  selected  by  the  input  after  the 
execution  of  Lp,  all  variables  affected  by  the  fault  and  subsequent  erroneous  calculation 
have  had  their  values  replaced  by  values  resulting  from  calculation  or  reference  with  solely 
unaffected  variables.  This  conflicts  with  the  first  if  inside  the  for  each  loop  in  Prop- 
agate, which,  by  returning  'false',  precludes  input  that  selects  such  paths  from  inclusion 
in  the  failure  region. 

Since  each  possible  cause  of  the  negation  of  Proposition  1  has  been  shown  to  be  a 
contradiction,  that  proposition  is  found  to  be  true.  DeriveRegion  is  thus  proven  to 
generate  necessary  conditions  for  a  fault  to  be  executed  and  lead  to  a  failure. 

The  negation  of  Proposition  2  assumes  that  some  input  exists  that  does  not  satisfy  the 
conditions  generated  by  DeriveRegion  but  does  lead  the  fault  to  produce  some  failure. 
This  may  only  happen  through  one  or  more  of  the  following: 

si  The  input  causes  program  execution  along  a  path  from  an  input  statement 
through  the  fault  location  that  was  not  considered  in  DeriveRegion. 
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s2  The  conditions  derived  from  the  input  cause  the  fault  to  generate  an  error  in  a 
manner  not  considered  in  Cp- 

s3  The  input  causes  program  execution  along  a  path  from  the  fault  location  to  an 
output  statement  that  was  not  considered  in  DeriveRegion. 

Option  si  is  shown  to  be  a  contradiction  through  an  argument  that  parallels  the  refuta- 
tion of  option  nl.  Option  s2  is  a  direct  contradiction  of  the  assumption  that  Cf  has  been 
accurately  captured  by  the  fault  detection  process.  Option  s3  is  shown  to  be  a  contra- 
diction through  an  argument  that  parallels  the  refutation  of  option  n3.  All  three  options 
being  shown  as  contradictions,  DeriveRegion  is  proven  to  generate  sufficient  conditions 
for  a  fault  to  be  executed  and  lead  to  a  failure. 

This  proof  of  necessary  and  sufficient  conditions  is  subject  to  the  assumptions  made 
at  the  start  of  the  section.  These  assumptions  limit  the  applicability  of  the  manual  tech- 
nique to  those  contexts  that  can  reasonably  satisfy  them.  If  the  technique  is  applied  in  a 
context  where  these  assumptions  do  not  hold,  it  may  generate  the  necessary  and  sufficient 
conditions,  but  this  proof  does  not  hold. 

4     Automated  Support  for  Failure  Region  Analysis 

While  the  manual  process  allows  for  the  derivation  of  software  failure  regions,  it  is  too 
mentally  taxing  and  too  slow  for  use  with  large  or  complex  software.  To  facilitate  and 
accelerate  failure  region  derivation,  five  semiautomated  software  tools  have  been  developed 
to  support  the  failure  region  analysis:  REACHER,  WALKER,  FALTER,  SPACER  and  VIEWER. 
A  flow  diagram  of  the  use  of  these  tools  is  in  figure  9. 

These  tools  are  not  intended  to  replace  a  human  analyst,  but  rather  to  support  the 
analyst  in  the  derivation  of  failure  regions.  At  any  point  in  the  analysis  process,  the  analyst 
may  intervene  to  replace  conditions,  change  the  source  code  fragment  being  analyzed,  or 
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Figure  9:  Flow  Diagram  of  Support  Tools 


examine  previous  analysis  results.  At  some  points  in  the  process,  the  analyst  is  required 
to  provide  information  that  is  not  automatically  determinable. 

Two  tools  support  the  reachability  analysis  process.  REACHER  parses  Pascal  programs 
and  builds  a  control  flow  graph.  This  control-flow  graph  is  a  data-structure  representation 
of  the  program  model  given  in  section  3.2.  Using  the  control-flow  graph,  the  conditions 
under  which  each  statement  in  the  program  may  be  reached  may  be  derived. 

REACHER  is  constructed  using  LEX  and  YACC  and  a  specification  for  Pascal  based  on 
the  ISO  standard.  In  a  single  non-interactive  pass  over  the  Pascal  source  code,  REACHER 
creates  a  list  of  blocks  corresponding  to  each  procedure  and  function.  Each  block  contains 
information  such  as  the  declaration  text  for  each  block,  the  sub-blocks  belonging  to  or 
nested  within  each  block  and  a  pointer  tracing  the  block  back  through  the  thread  of  its 
definition. 

Concurrent  with  the  construction  of  the  linked  list  of  blocks  is  the  construction  of  a 
linked  list  of  nodes  that  correspond  to  the  statements  in  the  program.  It  is  this  list  that 
results  in  the  control-flow  graph.  For  example,  consider  the  Pascal  fragment  in  Figure  10. 

begin 

readln (x) ; 
if  x   >    3   then 

y    :=   10 
else 

y    :=   20; 
writeln(x/    y) 
end 

Figure  10:  Example  Pascal  Fragment 

The  list  would  contain  nodes  for  begin-end,  the  readln  procedure  call,  the  if- 
then-else,  the  two  assignment  statements  and  the  write  In  procedure  call.  The  nodes 
are  linked  as  shown  in  figure  11,  forming  the  control- flow  graph.  Each  node  is  annotated 
with  the  conditions  under  which  program  flow  may  proceed  to  each  succeeding  statement. 

Once  REACHER  has  constructed  the  initial  control-flow  graph,  annotated  with  the  im- 
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Figure  11:  Node  Linkages  in  Example  Pascal  Fragment 


mediate  branching  conditions,  WALKER  is  used  to  collapse  unneeded  nodes  and  to  collect 
the  immediate  branching  conditions  into  the  node  reachability  conditions.  Unneeded  nodes 
are  those  unrelated  to  the  fault  being  analyzed.  They  are  collapsed  by  merger  of  the  node 
information  in  the  internal  data  structure.  The  collection  of  branching  conditions  into 
reachability  conditions  is  accomplished  by  a  user-guided  loop  [0,1]  traversal  as  described  in 
section  3.4.  By  default  WALKER  will  'and'  together  the  branching  conditions  of  the  nodes 
traversed.  If  the  user  finds  the  resultant  conditions  undesirable,  WALKER  allows  the  user  to 
replace  the  current  condition  with  one  of  the  user's  choosing  at  any  point  in  the  traversal. 
WALKER  is  implemented  in  C  using  the  BSD  Unix  support  libraries. 

FALTER  is  used  to  attach  a  description  of  a  known  fault  to  the  data  structure  cor- 
responding to  the  program  block  in  which  the  fault  is  found.  The  fault  description  cor- 
responds to  the  information  described  in  the  fault  model  in  section  3.2,  including  the 
error-generation  conditions  and  the  list  of  variables  with  values  immediately  contaminated 
by  the  fault.  FALTER  then  translates  the  internal  structure  into  a  LISP  expression  for  sym- 
bolic execution  by  SPACER.  FALTER  is  implemented  in  C  using  the  BSD  Unix  support 
libraries. 

SPACER  supports  the  propagation  analysis  described  in  section  3.5.  At  the  start  of 
execution,  SPACER  prompts  the  analyst  to  select  the  fault  to  be  analyzed  from  the  list 
of  faults  identified  through  FALTER  in  the  current  program.  SPACER  then  symbolically 
executes  the  program,  performing  a  traversal  of  the  program  paths.  When  the  fault  lo- 
cation is  reached,  SPACER  initiates  the  analysis  process  described  in  section  3.5,  using 
the  information  provided  via  FALTER.  At  each  output  statement,  SPACER  reports  the 
current  propagation  conditions  for  the  variables  referenced  in  the  output.  During  this  pro- 
cess, SPACER  requires  user  guidance  to  select  paths  to  traverse  and  to  resolve  complex 
expressions.  SPACER  periodically  reports  to  the  user  the  current  state  of  the  analysis  and 
the  statements  being  traversed.  SPACER  is  implemented  in  Allegro  Common  LISP,  using 
portions  of  the  UNISEX  symbolic  execution  package[7]. 
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While  each  of  the  above  tools  may  be  directly  accessed  by  the  analyst,  a  coordinating 
visual  interface  is  provided  by  the  VIEWER  tool [2].  VIEWER  presents  textual  and  graphic 
information  to  facilitate  the  analyst's  tracking  of  the  analysis  process,  provides  a  mouse- 
driven  command  structure  to  replace  the  text  command  structure  supported  by  the  other 
tools,  deals  with  tool  initiation  and  termination,  eases  consistency  between  tool  usage,  and 
provides  an  extensible  command  structure  to  ease  common  commands.  VIEWER  is  imple- 
mented in  C  using  the  Sun  View  user  interface  libraries.  A  sample  of  VIEWER  operation 
(one  supporting  WALKER  execution)  is  given  in  figure  12. 

5      Conclusions 

This  paper  has  presented  a  new  technique  for  deriving  the  bounding  conditions  for  soft- 
ware failure  regions.  These  conditions  form  the  necessary  and  sufficient  conditions  for  an 
individual  program  fault  to  be  executed  and  to  lead  to  a  failure.  But  the  technique  requires 
that  the  fault  be  known  prior  to  the  analysis.  One  testing  task  in  which  this  knowledge 
would  be  present  is  the  regression  testing  task. 

There  are  two  immediately  obvious  ways  in  which  failure  regions  analysis  aids  regression 
testing.  First,  the  failure  region  offers  a  framework  within  which  statistical  sampling 
techniques  can  be  used  to  evaluate  fault  correction.  Testers,  if  they  know  the  limits  on 
the  useful  test  cases,  can  intelligently  plan  regression  tests,  rather  than  iterate  previous 
testing.  Research  is  in  progress  on  detailed  regression-testing  techniques  based  on  sampling 
of  failure  regions. 

The  second  way  failure  regions  analysis  aids  regression  testing  is  by  focusing  detailed 
attention  on  the  fault.  As  a  review  method,  failure  regions  analysis  may  allow  the  analyst 
to  identify  if  a  proposed  correction  is  inadequate  prior  to  run-time  testing.  As  a  planning 
method,  failure  regions  analysis  may  allow  maintenance  personnel  further  insight  on  the 
fault  as  they  consider  alternative  corrections  to  the  fault. 
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Figure  12:  Sample  of  VI EWER  Operation 


Finally,  knowledge  of  the  failure  region  bounds  allows  examination  of  a  variety  of  issues 
relating  to  the  applicability  of  software  testing  techniques.  Research  is  in  progress  that 
applies  software  failure  region  analysis  to  a  population  of  software  with  known  faults  and 
examines  these  issues.  In  summary,  software  failure  regions  analysis  offers  both  assistance 
in  portions  of  the  software  test  effort  and  insight  into  a  variety  of  phenomena  relating  to 
faults  and  their  detection. 
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